-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(snap): avoid refresh on package_upgrade: true and refresh.hold #5426
feat(snap): avoid refresh on package_upgrade: true and refresh.hold #5426
Conversation
3b55abd
to
3bf177d
Compare
This looks good; thanks for including an example in the docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left one small question but not going to block on it.
package_upgrade: true | ||
snap: | ||
commands: | ||
00: snap refresh --hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add a comment or something that indicates that by omitting any duration, snap refresh --hold
is how you set it to "forever"? For people more familiar with snap this might be obvious, but when i read To avoid calling
snap refresh in images with snap installed, setting snap refresh.hold to forever
, my instinct would be to put snap refresh --hold forever
or something like that. Just a thought. Maybe it's overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comments.
["snap", "get", "system", "refresh.hold"], rcs=[0, 1] | ||
) | ||
|
||
if out.strip() == "forever": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I want snap holds enabled, but not by cloud-init, how would I do that with this feature?
If I set snap refresh --hold=15m
, cloud-init will still attempt the refresh.
I notice that snap returns 1 if this key isn't set. Also, I see what looks like a machine-readable flag -d
, if parsing is really required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holmanb this is a good comment, thanks! We should support time-based holds as well as forever. Just added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to revert this element because snap daemon itself sets a refresh.hold during early boot, so cloud-init can't distinguish between the snap deamon injected refresh.hold window and something set by user-data.
390ceb7
to
8ebaeb1
Compare
8ebaeb1
to
926be8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, Chad! Just left one small grammar nit but otherwise, looks good to go to me.
try: | ||
result = subp.subp(command) | ||
snap_hold = ( | ||
util.load_json(result.stdout).get("refresh", {}).get("hold") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I start a new lxd container, I see a refresh hold immediately upon starting the container. I think a refresh hold exists regardless if the user has specified one and just points to the next time snap will refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, yes you are correct. Just saw this on jammy and oracular images. Ok, we can't approach this feature from this angle. If someone wants temporary snap hold during boot, They can add a runcmd: snap refresh --unhold to user-data. I'll update docs and avoid the time-based hold since it looks to be set during boot anyway that ties directly to the Jul 10 22:33:58.917075 test-j systemd[1]: Started Snap Daemon.
getting started in journalctl.
926be8c
to
8a68069
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To Brett's original point, the behavior here doesn't feel entirely consistent, but I don't really see a better way of approaching it.
I left one inline nit, but it's not blocking.
@@ -0,0 +1,8 @@ | |||
#cloud-config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It seems a bit odd to have 2 of our 3 examples dedicated to a workaround for a single package manager on a single OS. I don't think we really need an example showing how to unhold it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me. I'll drop that example. of needed it is a corner case for sure.
When snap refresh.hold is set to forever, an admin is saying they do not want generic automated refreshes of snaps performed by default. This should be an indicator to cloud-init to avoid calling snap refresh on such systems due to a `package_upgrade: true` present in user-data. For network-limited environments with images which have the snap package manager but don't want to wait and timeout on snap refresh, the following user-data can be provided to still allow for package_upgrade: true, and avoid a 20-30 second wait on snaps being unable to access certain snap URLs. #cloud-config package_upgrade: true snap: commands: 00: snap refresh --hold=forever cloud-init now interrogates the state refresh.hold value by calling snap get system -d If snap refresh --hold was called in that environment to set 'forever', cloud-init will skip calling refresh and log the reason for skipping. We cannot honor short time-based refresh.holds because the snap services place a short hold in early boot anyway as systemd units startup. Fixes: canonicalGH-5290
8a68069
to
3dbdade
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on a solution to this issue @blackboxsw.
It may solve the problem for the filed bug, but the UX isn't beautiful, and the implementation isn't elegant either - it requires a pretty major layering violation, depends on snap behaviors that are not well documented, and puts the responsibility on the user to figure this out from the examples - a needle in the haystack. This solution may work - and it has the huge benefit of being far simpler than a full redesign, but I'd rather that this isn't the end goal for the UI.
I think that it would ultimately make a lot more sense for cloud-init to stop having a unified UX for all package managers and provide more tasteful ones for ones like snap and apt individually.
I think that a configuration format like this
apt:
upgrade: true
snap:
upgrade: true
would be far more obvious and intuitive to reason about for an end user than
package_upgrade: true
snap:
commands:
00: snap refresh --hold=forever
Thoughts?
package_upgrade: true | ||
snap: | ||
commands: | ||
00: snap refresh --hold=forever |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a workaround, we should include a comment explaining why someone might want this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do include that comment in data.yaml for example2 which is rendered above the example in the modules documentation
- comment: "By default,
package_upgrade: truewill trigger upgrades on any installed package manager. To avoid calling
snap refreshin images with snap installed, set snap refresh.hold to
forever will prevent cloud-init's snap interaction during any boot"
. I'd updated this comment and the general desciprtion of the module to clarify that it interacts with any installed package managers which include apt/pkg/yum/zypper/snap
While I still think that the above would make the most sense long term, I appreciate that the alternative that I've proposed is a significant chunk of work on top of what has already been done, and I don't want to block a workaround from existing for the short term. I'm fine with merging your proposed solution and filing an issue to redesign the UI at some later date when we have time to design a more intentional UI. |
Thanks a lot @holmanb. While I agree with your sentiment that allowing cloud-config syntax to be more specific when enabling/disabling upgrade interaction based on specific package mangers would be cleaner, I think there are a couple of reasons I wouldn't want to go this route in cloud-init:
I don't want to block the idea outright, but I also don't really want to prioritize this specialization of config keys where possible unless we think it's handling a significant/frequent use case as I feel it potential dilutes the opinionated default behavior of cloud-init. |
Using the same configuration across different package managers is itself a minority use case, yet this was chosen as a key design point for cloud-init's package management design. Package managers have different semantics. Trying to force multiple package managers to share the same cloud-config keys forces us to attempt to coerce the semantics to look identical when they are clearly different (see the bug that this fixes as evidence). The current design sacrifices elegance, user control, and intuitive design in attempting to meet its goal of a common UI for different package managers. Examples:
Perhaps, but how much new complexity is that, really? A new cloud-init user that uses Ubuntu is likely to know that apt and snap are their package managers. If they try to search for the snap or apt cloud-config module in the documentation, they will be surprised to find that telling cloud-init to update snap or apt packages has nothing to do with either the apt or the snap module. in some cases this will simplify the configuration as it is to complicate it or add verbosity: consider a user that doesn't want snap packages upgraded but does want a ppa added and packages package_update: true
package_upgrade: true
packages: [neovim]
snap:
commands:
00: snap refresh --hold=forever
apt:
sources:
neovim:
source: ppa:neovim-ppa/stable versus the hypothetical: apt:
update: true
upgrade: true
packages: [neovim]
sources:
neovim:
source: ppa:neovim-ppa/stable The second one groups elements related to apt, follows the rule of one top-level key per module, and is no more verbose than the first - much cleaner. Compare this to the added complexity of a user that just wants both apt and snap packages upgraded. currently: package_update: true
package_upgrade: true versus the hypothetical: snap:
upgrade: true
apt:
update: true
upgrade: true It is three more lines of configuration, but under the hypothetical one, a user that understands Ubuntu packaging can read it and intuitively understand what is supposed to happen - the added verbosity actually adds value by making the expected behavior more intuitive. Another point to consider is that most users are likely to know their distro better than they know cloud-init. Cloud-init's abstraction layer requires the user to learn a new set of semantics from the package managers that they already know. For example, what does I think that the point about "specialized knobs per package manager adding complexity" misses the fact that having a cloud-config that maps more cleanly to package manager behavior actually reduces complexity, because it would remove the burden of mapping different semantics of each package managers from cloud-init's code - making documentation simpler and cloud-init configuration much more intuitive since it would simply be expose behavior of the package managers which the user already knows.
I didn't recommend keeping the old keys[2]. Anybody that requires support for multiple distros could still accomplish the same behavior using jinja templates. Anybody that wants to use multiple distros likely needs to do this anyways due to the aforementioned package naming differences, which undermines the benefit promised by a generalization layer. Alternatively one could just include cloud-config for multiple package managers and cloud-init would interact correctly with whichever ones are installed.
I don't think we have real statistics to quote on which use cases are actually covered, and anyone who cares about boot speed but doesn't need snaps may want to do this. Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.
The current design is opinionated, but I wouldn't call it tasteful. I still think that we can make it better: both opinionated and tasteful.
That's what it does currently, but I'm still not convinced that this is what it should do. It's not intuitive. It's not beautiful. It's not simple. It's not easily maintainable.
I'm still fine with merging this as a short term workaround, but I also still think that we can design something better, which is why I was suggesting that we continue this conversation in a separate issue in my last comment:
The mere existence of this bug and workaround is yet another signal that the abstraction layer design concept is flawed and could use a re-think. Distros and package managers are different under the covers, and attempting to pretend otherwise yields code that has higher maintenance cost and more complex behavior which complicates user documentation and makes cloud-init harder to use and debug. [1] as proposed in this PR:
[2] they violate the "top level key per module" design |
I disagree with this...quite strongly actually. In a perfect world where we can perfectly design everything, no bash scripts are needed. But we're not perfect, and it's important to have escape hatches. Does the use of a trivial runcmd indicate a gap in cloud-init? Yep! But if we had taken 5 years to perfectly design and implement cloud-init, it would have been DOA while still missing half of the functionality people actually need. It's easy to see flaws in designs after the fact, but if I think back to when the package manager to distro relationship was 1:1 and all package managers essentially performed the same operations, the current design seems reasonable to me. I agree that your current design proposal is better given the introduction of snaps, but at what cost? We have the actual coding and the evitable breakage we'd introduce, adding new top-level keys, deprecating the old ones, and supporting both for a significant (5+ years ?) time-period. The end result being that package management may be slightly easier to grok for new users, and we can update apts and snaps independently. To me that comes out as a lot of work and forcing change upon existing users for little benefit. To echo @blackboxsw :
This workaround is to accommodate a use case where snap is installed but being intentionally blocked by a firewall. The only reason we're adding this workaround is because we changed existing behavior that we now can't change back. If this had been released to devel only, we would have marked the issue as "won't fix". That said, I'm ok adding your design proposal as an enhancement issue, but unless something else changes in the package management landscape, I don't see it as a priority.
Has this ever been a design goal of cloud-init? Is it currently? If so, that is news to me. |
@TheRealFalcon Which part specifically do you disagree with? I agree with every statement in your last comment, but I don't see any contradiction with mine so I don't understand the source of disagreement. There are only two things that I have said we should do:
It sounds like there is agreement on the first point, and I'm not hearing objections on the second either, nor with the design proposal. Cloud-init has made many design compromises over the years in order to ship code and add value, and I want for us to iterate on the principles of design so that the design of future features are approached with new insight - whether that is a redesigned package management UI or something else that is deemed more important.
Design goal? There is no stated design goal for this that I'm aware of. Many of the newer configuration modules in cloud-init were designed this way, and avoiding the top level keeps things tidier. I can write it down somewhere in the docs as a preference for future modules if you want, but I thought that it went without saying that configuration modules directly exposing their specific configurations at the global namespace is less organized and more prone to future name conflicts. What are your thoughts, is it better, worse, or indifferent for a configuration module to populate the top level with multiple keys? There are pros and cons to what I proposed above and the way it is currently, and ultimately the project needs to provide value to users so code needs to ship. I've said multiple times that I'm fine with doing exactly that - lets ship this code. Let me know if I should file the issue. |
In that particular line quote, that the value of cloud-config is undercut by bash scripts. I think it's actually the opposite. The bash scripts are the escape hatches that people need. But to the larger point, your previous large post argues that the current solution is subpar and what the ideal solution should look like. While I agree on those points, I think the current solution is good enough that I don't think it's worth the burden to give it any kind of priority. Perhaps I misread your intent, but given how much was said, it seemed you were arguing that this is a task worth doing sooner than "when convenient", which historically for this project has meant basically never.
I'm ok with both of these as long as we're on the same page about priority.
Yeah, I agree that is bad. The wording of "rule of one top-level key per module" came off as stricter to me than you probably intended. I was thinking that in your ideal cloud-config, every module has exactly one corresponding top-level key and vice versa. If that's not what you meant then we're probably on the same page. |
I think that maybe you misunderstood what I was saying. I didn't say that the value of cloud-config is undercut by bash scripts. I said that the value of cloud-config is undercut because going back to a bash script is required frequently for trivial things i.e. cloud-config doesn't implement basic features and so people are forced to use bash scripts for these basic features - which might make one wonder why they should bother with cloud-config at all.
100% agreed. We need them, because basic features in cloud-config are missing. We need them, which decreases the value of cloud-config because it forces reliance on a secondary (non-imperative) paradigm. Cloud-config would be more valuable if the user wasn't forced to use bash to work around missing features.
I didn't mean sooner than when convenient. @blackboxsw had already mentioned priority and I hadn't objected - I just wanted to illustrate the shortcomings of the current design since a defense for it had been presented.
+1
I didn't mean to imply that multiple top level keys should never happen in the same module. I just think that it's a cleaner design and is how I would recommend design future modules unless there is a good reason not to. |
I agree that defining a common cloud-config key for various package managers does sacrifice user control by providing a more generic/opinionated approach to package manager interactions.
Good point, though our opinionated general design involves making decisions for each package manager as to what the default should mean for that package mgmt backend. In this case, our package update for snap NOOPs because default behavior of snap as a package manager is to refresh those packages per any --hold requests or configuration. For apt/yum/zypper it means to invoke the
We have separate abstractions for apk repos/apt repos/yum repos because some of the configuration options exposed are a 1:1 replacement for repo-specific configuration syntax, publication streams or named keys. Given that the repository setup is tightly coupled to a repo configuration syntax, that generalization doesn't make as much sense as say, installing a named package.
This is not ideal that we have a workaround to offer alternatives to avoid changing behavior on stable downstreams as you've mentioned (and we all agree there). But in this particular fix case, the investment in delivering a solution that surfaces specific knobs per package manager is not worth the prioritization of that work for a corner-case where the package manager is installed, active, but disabled via external firewall rules that cloud-init has no visibility to.
While this is true, there is only one cloud-config key that admins would need to know in order to install a named package on any distro "packages" . They may need to know that the package is named X on Alpine environments or Y on deb-based environments and Z on Rocky, but they wouldn't need to also alter apk/apt/yum/zypper keys. I understand that you are saying it doesn't buy them much, because they still have to know the specifics of the package they are installing, so why not clearly represent the package manager they know they are talking to as well.
Not much new complexity in cloud-init code, just a smidge complexity in admin's having to state what their target package manager is when deciding to install/remove/upgrade packages. For packages which happen to have the same name in multiple distros they now have to represent:
vs
vs
Yes, but instead they find a generic "Package Update Upgrade Install" which applies to any supported distro. I'm even wondering if we'd maybe want to reorder that module name to "Package Install Update Upgrade" because I'd presume most searches in cloud-init would be something like "install software" "package install" "rpm install" etc.
I agree this is busy cloud-config and not ideal for this use case. But, I suggest this this an image that doesn't meet the user's need because it has snap installed and active which they ultimately want disabled.
Agreed,yet I think this sentiment applies to using cloud-init's user-data for any operation. Admins trying to drive config changes to a fleet of VMs as they are launched will still have that learning curve of how to express that operation in cloud-config even if we are representing generalized operations that look appropriate under the scope of a package manager specific top-level key. They'll still have to learn to lay the update/upgrade/packages under a top-level apt/yum/zypper/apk key.
Yes, this is a "workaround" for what I would consider a case upstream cloud-init should not add to mainstream support yet. I think this conversation is one that warrants a separate upstream issue as you have offered so we have have a more detailed discussion in a focused setting outside the scope of this workaround. It's worth extended discussion on this and the general goal of cloud-init cloud-config primitives. What is our goal here and how do we we balance the tension of exposing environment specific details in user-data versus generalized operations for features which cloud-config user-data supports. I can justify the differences we see in repo setup modules for yum/zypper/apt/apk formats warranting these separate keys as there are significantly separate behaviors and expressions of those config switches. For package managers, maybe we are taking too shallow a view in presuming that the majority of operations are similar enough to justify a general approach to their interaction.
+100
I meant, two ways to do things being that you have to speciffy
Yes those are not real statistics for this case, I'm trying to say if we have a case where we think it meets the majority of users we should provide the best general approach for the environments we support. If we think that we are not accommodating enough of a significant community need, then we can prioritize that work to make sure it is better expressed in cloud-config.
Yes, but we also can't accept every feature gap that cloud-config has and craft a fully expressive solution that is inclusive of all cases. We have to bear in mind that we don't have the bandwidth to "achieve all the things". If filing an issue helps at least capture that gap for further discussion, I'm on board. But, whether it will result in a near-term solution to incorporate this feature, I am uncertain about priority and bandwith available to meet this need.
If referring to this PR I'd agree, providing a workaround for something already introduced that we can't back out because it will break other consumers is not beautiful, simple or intuitive. So I can agree there, What is not simple or intuitive about cloud-init having awareness enough to interact with a package manager that is installed and available if presented with an declared operation in cloud-config? If cloud-init grew flatpak support and an image contained flatpak commands, I'd want my
Yes let's do this on a separate issue. As we are seeing there are differing perspectives here on what is considered 'best' for the user. |
Yes, but again, I think that individuals using the same cloud-config on different distros is a minority use case, not a majority one. This abstraction layer benefits (barely - alternative methods to do the same thing exist as previously mentioned) a minority user group while sacrificing intuitive and elegant design, simplicity of implementation, etc.
This is behavior that a user may want to choose at runtime via cloud-config. It would be more user-friendly for cloud-config to design a more expressive interface than to expect users to build images without snap installed.
I see what you're saying, but this isn't two ways to do something. These are two different things, after all. The costs associated with multiple implementations doesn't apply to this: it wouldn't require some sort of prioritization or conflict resolution logic (the cost of duplicate methods to code implementation/maintenance). I don't think that the costs normally induced by multiple implementations apply to the proposed design - yet the prioritization / conflict resolution logic certainly has to exist under the current abstraction.
Package managers are a key point of differentiation from one distro to the next. A paper-thin abstraction layer which attempts to make them all behave the same way is an exercise in busywork - and a recipe for confusion in the case of snap on Ubuntu. I don't think that very many package manager use cases are covered at all by this abstraction - outside of wanting a system to be up to date.
We're on the same page about that. This is just a really basic and simple feature which is much harder to solve than it should be - and the cause for this pain can be traced to the current abstraction layer UI and the snap implementation that was bolted onto it despite the semantic mismatch.
Simply installing from a list using a single package manager is simple, sure. But that's not where we are at today. We have two package managers on Ubuntu but try to use the same old user interface. Cloud-init cannot make the best decision for the user when multiple package managers are installed without extra information. How is cloud-init supposed to do the right thing when only a single Cloud-init deals with the single-package-list problem by making its list of packages contain not just package names but overload this optionally with the name of a package manager so that the user can specify which manager to install a package with. packages:
- snap: # why is this a member of the list and not an named object / dict?
- certbot
- [juju, --edge]
- [lxd, --channel=5.15/stable]
- apt: [mg] This is needlessly complex as it yet again requires the user to break out of the abstraction in order to even just tell the system to try to install a package using one package manager - the simplest of tasks. If we had wanted to maintain backwards compatibility with this abstraction layer while also following the existing patterns in this module, then we could done this to allow users to specify which package manager they want to update and upgrade: package_update: true
package_upgrade:
- apt: true
- snap: false But yet again we see in an example like this that users need to specify a package manager-specific detail despite the abstraction layer supposedly providing value.
I wouldn't. Security concerns aside, as a user there are knobs unique to flatpak that I would probably want exposed to me without having to switch back and forth between abstracted and dedicated interfaces. Trying to shove yet another package manager with different semantics into the same user interface would likely introduce a whole other set of issues (just like this one, which resulted from adding snap support under the same UI). I'd prefer a clean new module - free of the historical baggage and bugs of the legacy UI. |
-- 07102024: updated proposed commit message, reverted behavior to only honor refresh.hold=forever and updated example doc for temporary snap hold.
Proposed Commit Message
Additional Context
#5290
Test Steps
Checklist
Merge type